Using Swarm Intelligence To Accelerate Pulsar Timing Analysis 
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We provide brief notes on a particle swarm-optimisation approach to constraining the proper- 
ties of a stochastic gravitational-wave background in the first International Pulsar Timing Array 
data-challenge. The technique employs many computational-agents which explore parameter space, 
remembering their most optimal positions and also sharing this information with all other agents. 
It is this sharing of information which accelerates the convergence of all agents to the global best-fit 
location in a very short number of iterations. Error estimates can also be provided by fitting a 
multivariate Gaussian to the recorded fitness of all visited points. 

PACS numbers: 



6 



>: 
ov 

m 
O 

(N 



INTRODUCTION 

The first direct detection of gravitational-waves (GWs) 
will be a major step forward in astrophysics. There are 
plans in place which will permit detector-coverage over 
a huge range of GW frequencies by the late 2020s. Ad- 
vanced ground-based interferometers, such as AdLIGO 
Q, AdVirgo 0] and KAGRA Q, should be operating at 
design sensitivity by the end of this decade, and will be 
sensitive in the range ~ 1 — 10 3 Hz. It is hoped that a 
space-based interferometer with arm-lengths of ~ 10 9 m, 
such as eLISA/NGO 0, will be operable by the end of 
the 2020s, and sensitive in the range ~ 0.1 — 100 mHz. 

If a GW crosses the path of an electromagnetic (EM) 
pulse it can induce a frequency shift of the signals which 
will be dependent on the amplitude of the two GW po- 
larisations, the angle between the GW-source and the 
EM-source, as well as the sky-location of the EM-source 
d-Q. Thus Sazhin [j| and Detweiler [§] independently 
described how low-frequency (1 — 100 nHz) GWs could be 
detected via their influence on the arrival-times of signals 
from precisely timed pulsars. This is made feasible by the 
often sub- /us level of timing-precision achieved through 
the measurements of millisecond pulsar-signal time-of- 
arrivals (TOAs). In this case, the range of frequency- 
sensitivity is set by the observational time-span (/i ow ~ 
1/T) and the cadence (/ high - 1/(2AT)). 

"Pulsar timing arrays" (PTAs) [t| allow us the op- 
portunity to use the Milky Way as a kpc-scale GW- 
detector, as tens of Galactic millisecond pulsars are 
observed over several years. The International Pulsar 
Timing Array (IPTA) [l(| consortium combines the ef- 
forts of the European Pulsar Timing Array (EPTA) [Tljj , 
the North American Nanohertz Observatory for Gravita- 
tional Waves (NANOGrav) 0, and the Parkes Pulsar 
Timing Array (PPTA) fll. I t recently initiated the first 
"IPTA Data Challenge^!] 5 the aim of this challenge 
was to test new and existing algorithms for the purpose 



of constraining the properties of a background of GWs 
using PTAs. 

It is not only GWs which may induce deviations of the 
TOAs. The dominant perturbation is caused by the de- 
terministic spindown of the pulsar itself, as its rotational 
energy is extracted to power the EM outflow. There are 
also stochastic contributions to the deviations caused by 
a variety of sources, including clock noise, receiver noise 
and variations of the dispersion measure of the interven- 
ing interstellar medium. These effects must be accounted 
for and removed from the TOAs to produce the timing 
residuals, which contain the influence due to all unmod- 
elled phenomena, including GWs. While there is a rich 
literature on the subject of the detection of single GW- 
sources using pulsar-timing (e.g., fl5l - 420j | ) . the holy grail 
of PTAs is a stochastic gravitational-wave background 
(GWB). 

An isotropic, stochastic GWB may consist of a super- 
position of many single sources which are not individ- 
ually resolvable. The largest contribution will likely be 
from a background induced by a cosmological population 
of inspiraling supermassive-black-hole-binary (SMBHB) 
systems, with typical masses ~ 10 4 — 10 10 M Q . 

The fractional energy-density of the Universe in a GW- 
background is usually given as, 



p cr it. dim f) 4 

where / is the observed GW- frequency, p cr i t . is the 
energy-density required for a flat Universe, and h c (f) is 
the characteristic strain of the GW-background in a fre- 
quency interval centred at /. 

The characteristic strain spectrum of a GW- 
background resulting from inspiraling binary systems 
is approximately h c (f) oc /~ 2 / 3 2ll424|. We can ap- 



proximate the characteristic strain spectrum of a GW- 
background from other sources as a power-law also. Some 
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measurable primordial background contributions may 



have a power-law index of —1 
ground from decaying cosmic strings |27H3 
—7/6 [31]. For most models of interest, we can describe 



while the back- 
may have 



an isotropic, stochastic GW-background by 32] 



describe deterministic contributions to the arrival times. 
The vector of measured arrival times will be composed 
of a deterministic and a stochastic contribution (from 
time-correlated stochastic signals which are modelled by 
a random Gaussian process), 



h c (f) = A 



yr 



(2) 



This characteristic strain spectrum is related to the 
one-sided power spectral density of the induced timing 
residuals by, 



S(f) = 



1 1 

12^2/3 



h c (f) 2 = 
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yr 



(3) 



where 7 = 3 — 2a. 



Hcllings and Downs 33fl developed a simple cross- 
correlation technique for pulsars affected by the same 
stochastic, isotropic GWB, showing that the cross- 
correlation of the induced timing-residuals has a distinc- 
tive angular signature dependant only on the angular- 
separation of the pulsars. This angular correlation is 
given by, 



<5? gp . (7) 
The stochastic process has auto-correlation, 



c l3 = (st^sfn, 



(8) 



'*3 \—% —3 
where the elements of the covariance matrix are 
parametrised by a set of parameters, <p. Using the 
Wiener-Khinchin theorem, we can then define the auto- 
correlation as the Fourier transform of the power spectral 
density, 



C(t) 



S(f)cos(fr)df, 



(9) 



where r = 27r|i, — tj |, and S(f) is the power spectral den- 
sity of the time-series St rsp . A closed- form expression for 
the auto-correlation of a time-series influenced by an un- 
derlying power-law PSD is given in van Haasteren et al. 
[36j |. and is used in the following. 
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(4) 



where x = (1 — cos# a b)/2, and 9 a b is the angular sep- 
aration of the pulsar sky-locations. 

We define the power-spectral density of uncorrelatcd 
red-timing noise affecting the pulsar TOAs as, 



S(f) = N? ed 
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(5) 



PULSAR TIMING ANALYSIS 



Observations of pulsars lead to measurements of the 
pulse TOAs. The emission-time of a pulse is given in 
terms of the observed TOA by El], 
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A, 



(6) 



where A is the transformation from the site TOAs 
to the Solar-system barycentre, Ajg accounts for the 
delaying-effects as the pulse propagates through the in- 
terstellar medium, and Ab converts to the pulsar-frame 
for binary pulsars. 

In the first IPTA data challenge [lj] the raw data is 
in the form of pulsar parameter files (".par") and timing 
files (".tim"). The parameter file contains first estimates 
of the pulsar timing-model parameters; these parameters 



Processing raw arrival-times 

The ".par" and ".tim" files are fed to the Tempo2 
software package 34, [3^, 37] which processes the raw 



arrival-times. A vector of "pre-fit" timing-residuals are 
computed using first guesses of the "m" timing-model 
parameters from the ".par" files i.e., /3o,i — > Pi- This first 
guess is usually precise enough so that a linear approxi- 
mation can be used in the TOA fitting procedure. In this 
linear approximation, the timing-residuals are described 
by, 

5t = 5P rt + MC, (10) 

where 5t prf are the pre-fit timing- residuals (length n), 
£ is the vector of deviations from the pre-fit parameters 
(length to) defined as £ a = /3 a — /3 , QJ and M is the (n x to) 
"design-matrix" , describing how the residuals depend on 
the timing-model parameters. Tempo2 does not take 
into account the possible time-correlated stochastic sig- 
nal in the TOAs, so will perform a weighted least-squares 
fit for the timing-model parameters. Hence it is possible 
that some of the time-correlated stochastic signal is ab- 
sorbed in this fitting procedure, which is undesirable. 

The Tempo2 analysis provides output-residuals and 
the design matrix. The design matrix describes the de- 
pendence of the timing residuals on the timing-model 
parameters. The output-residuals form the input data 
vector for further study. 
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Generalised least-squares (GLS) estimator of 
stochastic and deterministic parameters 

We now want to re-process the Tempo2 output- 
residuals to take into account the correlated stochastic 
signal affecting the pulse arrival times. We relate the 
Tempo2 output-residuals to the stochastic contribution 
to the residuals by a further fitting process, 



C/EV*. The matrix G can be pre-computed and stored 
in memory for use in each likelihood calculation. 

Equation (|14[) provides a robust, unbiased Bayesian 
framework for the search for correlated signals in PTAs, 
and is used with uniform priors as the optimisation func- 
tion in the following section. 

PARTICLE SWARM OPTIMISATION 



St = Sf ep + M£, 



(11) 



where, in this case, 5t refers to the output-residuals 
from Tempo2. 

We re-fit the timing-model for each pulsar, taking into 
account the possible contribution from a time-correlated 
stochastic process with covariance matrix C. This covari- 
ance matrix may contain contributions from the GWB, 
white-noise from TOA-errors, and possibly red-timing 
noise which is uncorrelated between different pulsars. 
The likelihood of measuring post-fit residuals, St, with 
linear parameters £ and stochastic parameters, 0, is, 



£(6t\tJ) = 



1 



^(27r)"detC 
1 



exp 



(~ (5t- Alt) 7 C- 1 (St- Alt) 



(12) 



This likelihood expression is effectively a GLS estima- 
tor, and is the basis for the framework used in these notes 
to study the first IPTA data challenge. 

If we assume flat priors on the timing-model 
parameters then these parameters can be analyti- 
cally marginalised over. The posterior distribution 
marginalised over timing- model parameters is [38j |. 



■. exp 



v/detC x det(Af T C- 1 M) 

(13) 

where C = C~ x - C~ X M (M T C- 1 M) 1 M T C' 1 . 
When dealing with large datasets and many pulsars, 
C naturally involves the multiplication and inversion of 
high dimensional matrices. 

Expression (|13[) can be written more compactly and in 
a way which is slightly faster to compute [39( : 



V / (27r)"-™det(G T CG) 
exp f-lsfG {G T CGy 1 G T 5i*J , (14) 

where G is the matrix of the final (n — m) columns 
of the matrix U in the SVD of the design matrix, M = 



Particle swarm optimisation (PSO) is a method of find- 
ing the global maximum/minimum of non- linear func- 
tions using a swarm methodology. This techni que orig- 
inally developed out of a social metaphor HUSH* at- 
tempting to describe the dynamics of birds flocking, fish 
schooling and, more generally, the theory of swarms. 

Swarm algori thms are becoming more popular in as- 
tronomy [42H44| , and, more specifically, in gravitational- 
wave astronomy 45l.l46|. However as far as we are aware, 
the problem has not yet been applied to searches for a 
stochastic gravitational wave background with PTAs. An 
important caveat is that PSO will return a global best-fit 
solution, but not error bars. The sampling of parameter 
space by particles will not necessarily be representative 
of the likelihood/posterior surface. 

A set of "particles" (computational agents or cores) 
are driven by "cognition" and "social" factors to explore 
a parameter space by carrying out random walks. Par- 
ticles remember their past positions and values of the 
fitness/optimisation function, and share this information 
with all other particles. Each particle then carries out 
a random walk through parameter space, also experienc- 
ing a tendency to move toward their personal optimal 
location and the global optimal location. 

First we define the terminology of PSO algorithms, us- 
ing the notation of Prasad and Souradeep [44J . We have 
particles (computer cores or threads) which have posi- 
tions, X, and velocities, V, where X-*(i) refers to the po- 
sition of particle j at iteration i. The fitness/optimisation 
function, J 7 , is used for searching for a global maxi- 
mum/minimum, where J- 3 (i) refers to the fitness of parti- 
cle j at iteration i. We either try to maximise J- = InP, 
or minimise T — xtfi = — 2 In P. In the following we 
maximise In P. 

Pbest is the maximum value of the fitness function for 
particle j up to the present iteration, N. 



Pbest 3 = Max{j rj = 0, . . . , N} , (15) 
where the location of Pbest- 7 is given by the vector P J , 



P J =X\i) if =Pbest J 



(16) 



Gbest is the largest of the Pbest values among all par- 
ticles. This value only changes when a particle finds a 
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TABLE I: The globally optimum positions and associated \a errors as found by the PSO algorithm. No values are given for the 



red-noise parameters in 0pen3 


since Af ret j 


was consistent with 
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A 1 HT 14 


7 iVred / ns 


Yred 


True 


5 


4.33 


5 


4.33 


1 


4.33 10.1 


1.7 


ML 


4.81 


4.41 


5.38 


4.33 


1.21 


4.07 




A(9 


0.179 


0.0821 


0.256 


0.0876 


0.121 


0.155 





new position which is more optimal than any position 
any particle has ever visited. 



Gbest = MaxjPbest-?, j = 0, . . ., N p } , (17) 

where N p is the number of particles. The location of 
Gbest is given by vector G, 



where the particle velocity is set to be ± Vm ax if the pro- 
posed velocity is greater than Vmax or less than — V max , 
respectively. 

The "reflecting wall" boundary condition is used for 
the particle positions, such that a particle reverses it's 
velocity component perpendicular to the boundary if it 
tries to cross the boundary. Thus, 



G = X j (i) if Pbest-? = Gbest. 



Particle dynamics 



(18) 



The particle positions and velocities are updated ac- 
cording to, 



V j {i) = -V j (i), 



(21) 



or X n 



max ^i- mln 

or less 



and the particle position is set to be X 
when the proposed position is greater than X, 
than X m - m , respectively. Our search range for the GWB 
parameters are log(A) £ [—20, —12] and 7 £ [1, 5]. 



Initial conditions 



X^i + l) =X 3 (i) + V 3 {i + l), 



V j (i + 1) =wV 3 {i) + ciui 



X j (i) 



c 2 u 2 



G-Xi(i) 
(19) 



where c%, c 2 are "acceleration co-efficients", w is the 
"inertia weight" , and u± , u 2 are uniform random numbers 
in the range [0,1]. 

The values of c\ , c 2 tune the contribution due to per- 
sonal and social learning respectively. The 'W term 
moves particles in a straight line, while the c\Ui[. . .] and 
c 2 u 2 [. . .] terms accelerate particles toward the location of 
Pbest and Gbest respectively. 

We use the values from the PSO Standard 2006 code 
47]; w = 1/(2 In (2)) = 0.72 and ci = c 2 = 0.5 + In (2) = 
1.193. 



Boundary conditions 

Particle velocities are capped to prevent them rapidly 
leaving the search space. The maximum velocity is typ- 
ically set to be proportional to the search range. We 
adopt, 



(20) 



Random positions and velocities are assigned to each 
particle at the beginning of the algorithm, 



X 3 (i = 0) = X min + u 
V*Xi = 0) = uV max , 
where u £ U [0, 1]. 



^max X-n 



(22) 



Termination criteria 

For the purposes of establishing a termination crite- 
rion for the algorithm, we treat particle trajectories like 
MCMC chains, even though the particle trajectories are 
coupled. Gelman-Rubin R statistics [H, S| are used to 
determine when the swarm has sufficiently converged on 
the global best-fit location. 

We use the potential scale reduction factor (PSRF), R, 
of Brooks and Gelman [lij]. If R > 1 then the trajectories 
are not yet close to the true global optimum position. We 
run our PSO trajectories until R has been less than 1.02 
for all parameters for at least 50 iterations. Analysis 
of all datasets finished in less than two hours with 80 
cores, although the global best-fit location was typically 
reached in less than 30 minutes. 
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Error estimates 

PSO is designed to explore the full parameter space 
(preventing all particles getting stuck in local max- 
ima/minima) and then rapidly converge to the global op- 
timal location. A binning of all points visited by particle 
trajectories will not reconstruct the posterior surface. 

We can approximate the posterior surface close to the 
global best-fit point as a Gaussian, 



after the challenge deadline has passed and the results 
are widely available. The values in both Open and 
Closed datasets compare very favourably with tradi- 
tional stochastic sampling techniques such as MCMC and 
Nested Sampling. This will be discussed in an article 
which is in preparation by the authors and will be sub- 
mitted after the deadline ends. However, the PSO error 
estimates may not accurately reflect confidence intervals 
if we have a significantly non-Gaussian posterior surface. 



P = j P exp^--A T C'Aj, (23) 
where A a = (9 a — G a ) /G a for parameter a , and C — 

c-\ 

The standard error, A9 a , in parameter a is given by, 

A0„ = VCZ x G Q = sJ{C'-^) aa x G a . (24) 

We can fit a paraboloid to Axg ff = —2 (InP — InPo) 
(where we take Pq to be the posterior at G) , and identify 
the fitting co-efficients with the elements of C . Hence, 
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AXc 2 



A T C'A. 



(25) 



We limit the fitting to a subset of points, where only 
points within a hypersphere of radius |A | < 0.3, and 
A^efi < 10 are considered. This limited the fitting to 
several thousand points. 



CONCLUSIONS 

The globally optimum positions found by the PSO 
trajectories and the associated la errors for all Open 
datasets are listed in Table HI In all cases, we have 36 
pulsars with 130 observations each, spanning over ap- 
proximately 5 years. The white-noise in each pulsar is 
read-in as the TOA-error. In Figure [T] we show how the 
global best-fit location of OpenI is updated with time 
for varying numbers of cores/threads. As confirmed with 
MCMC and Nested Sampling, we are dealing with a uni- 
modal posterior surface, so beyond a few tens of cores the 
gains in the speed at which the global best-fit location is 
reached are minimal. However, if we had a multimodal 
surface then it would be desirable to have many PSO tra- 
jectories to aggressively search the parameter space for 
the global best-fit location. We recommend a few tens of 
cores, not only as a compromise between speed at which 
the best-fit is reached and computational expenditure, 
but also to permit an accurate reconstruction of the pos- 
terior in the vicinity of the mode for error estimates. 

In the interest of fairness, we do not give results for the 
Closed datasets here, but will update this manuscript 
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